Comments for MEDB 5501, Week 4

Bad quiz question

A research paper computes a p-value of 0.45. How would you interpret this p-value?

  1. Strong evidence for the null hypothesis
  2. Strong evidence for the alternative hypothesis
  3. Little or no evidence for the null hypothesis
  4. Little or no evidence for the alternative hypothesis
  5. More than one answer above is correct.
  6. I do not know the answer.

P-values

  • Most commonly reported statistic
    • Also sharply criticized
    • Requires a research hypothesis
  • Two alternatives
    • Confidence intervals
    • Bayesian analysis
  • What to do when no research hypothesis

What is a population?

  • Population: a group that you wish to generalize your research results to. It is defined in terms of
    • Demography,
    • Geography,
    • Occupation,
    • Time,
    • Care requirements,
    • Diagnosis,
    • Or some combination of the above.

Example of a population

All infants born in the state of Missouri during the 1995 calendar year who have one or more visits to the Emergency room during their first year of life.

What is a sample?

  • Sample: subset of a population.
  • Random sample: every person has the same probability of being in the sample.
  • Biased sample: Some people have a decreased probability of being in the sample.
    • Always ask “who was left out?”

An example of a biased sample

  • A researcher wants to characterize illicit drug use in teenagers. She distributes a questionnaire to students attending a local public high school.
  • (In the U.S., high school covers grades 9-12, which is mostly students from ages 14 to 18.)
  • Explain how this sample is biased.
  • Who has a decreased or even zero probability of being selected?

Type your ideas in the chat box.

Fixing a biased sample

  • Redefine your population
    • Not all teenagers,
      • but those attending public high schools.

What is a parameter?

  • A parameter is a number computed from a population.
    • Examples
      • Average health care cost associated with the 29,637 children
      • Proportion of these 29,637 children who died in their first year of life.
      • Correlation between gestational age and number of ER visits of these 29,637 children.
    • Designated by Greek letters (\(\mu\), \(\pi\), \(\rho\))

What is a statistic?

  • A statistic is a number computed from a sample
    • Examples
      • Average health care cost associated with 100 children.
      • Proportion of these 100 children who died in their first year of life.
      • Correlation between gestational age and number of ER visits of these 100 children.
    • Designated by non-Greek letters (\(\bar{X}\), \(\hat{p}\), r).

What is Statistics?

  • Statistics
    • The use of information from a sample (a statistic) to make inferences about a population (a parameter)
      • Often a comparison of two populations
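
The distinction between a parameter and a statistic can be illustrated with a small simulation. This is only a sketch: the cost values are simulated (not real data), and the log-normal shape and the seed are arbitrary assumptions. The population size of 29,637 and the sample size of 100 echo the examples above.

```python
import random
import statistics

random.seed(5501)  # arbitrary fixed seed so the sketch is reproducible

# Simulated (made-up) health care costs for a population of 29,637 children.
population = [random.lognormvariate(7, 1) for _ in range(29637)]

# Parameter: computed from the entire population (unknown in practice).
mu = statistics.mean(population)

# Statistic: computed from a random sample of 100 children.
sample = random.sample(population, 100)
x_bar = statistics.mean(sample)

print(f"parameter mu    = {mu:.0f}")
print(f"statistic x_bar = {x_bar:.0f}")
```

The two numbers will usually be close but not identical. That gap is the sampling error that inference has to account for.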

Break

  • What have you just learned?
    • Populations, samples, parameters, statistics
  • What is coming next?
    • The null and alternative hypotheses

What is the null hypothesis?

  • The null hypothesis (\(H_0\)) is a statement about a parameter.
  • It implies no difference, no change, or no relationship.
    • Example
      • \(H_0:\ \mu = C\) (some constant)
      • Hypothesis involving proportions covered later

What is the alternative hypothesis?

  • The alternative hypothesis (\(H_1\) or \(H_a\)) implies a difference, change, or relationship.
    • Examples
      • \(H_1:\ \mu \ne C\)

Hypothesis in English instead of Greek

  • Only statisticians like Greek letters
    • Translate to simple text
    • For mean and proportion comparisons
      • Safer, more effective
    • For correlations
      • Trend, association

One-sided alternatives

  • Examples
    • \(H_1:\ \mu \gt C\) or
    • \(H_1:\ \mu \lt C\)
  • Changes in only one direction expected
  • Changes in opposite direction uninteresting

Passive smoking controversy

  • EPA meta-analysis of passive smoking
    • Criticized for using a one-sided hypothesis
    • Samet JM, Burke TA. Turning science into junk: the tobacco industry and passive smoking. Am J Public Health. 2001;91(11):1742–1744.

Break

  • What have you just learned?
    • The null and alternative hypotheses
  • What is coming next?
    • Decision rules, Type I and II errors

What is a decision rule? (Example)

  • \(H_0:\ \mu = C\)
  • \(H_1:\ \mu \ne C\)
  • t = (\(\bar{X}-C\)) / se
  • Accept \(H_0\) if t is close to zero.
    • \(-2 < t < 2\) or
    • \(-Z_{\alpha/2} < t < Z_{\alpha/2}\) or
    • \(-t_{\alpha/2; n-1} < t < t_{\alpha/2; n-1}\)
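
The decision rule above can be sketched in a few lines of Python. The function names and the summary values (mean 52, C = 50, standard deviation 8, n = 25) are made up for illustration. The default cutoff of 2 is the simplest of the three rules listed, and a cutoff from a Z or t table can be passed in instead.

```python
import math

def one_sample_t(x_bar, c, sd, n):
    """Return t = (x_bar - c) / se, where se = sd / sqrt(n)."""
    se = sd / math.sqrt(n)
    return (x_bar - c) / se

def decision(t, critical=2.0):
    """Accept H0 when -critical < t < critical; otherwise reject H0."""
    return "accept H0" if -critical < t < critical else "reject H0"

# Illustrative (made-up) summary values.
t = one_sample_t(x_bar=52.0, c=50.0, sd=8.0, n=25)
print(round(t, 2), decision(t))
```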

What is a Type I error?

  • A Type I error is rejecting the null hypothesis when the null hypothesis is true
    • False positive
    • Example involving drug approval: a Type I error is allowing an ineffective drug onto the market.
  • \(\alpha\) = P[Type I error]

What is a Type II error?

  • A Type II error is accepting the null hypothesis when the null hypothesis is false.
    • False negative result
    • Usually computed at MCD
    • An example involving drug approval: a Type II error is keeping an effective drug off of the market.
  • \(\beta\) = P[Type II error]
  • Power = \(1-\beta\)
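
As a rough illustration of how \(\alpha\), \(\beta\), and power fit together, the sketch below approximates the power of a two-sided test using the normal distribution. It assumes a known standard deviation, and the inputs (a true difference of 5, a standard deviation of 10, n = 32) are hypothetical. An exact calculation for a t-test would use the noncentral t distribution instead.

```python
import math
from statistics import NormalDist

def power_two_sided_z(delta, sd, n, alpha=0.05):
    """Approximate power of a two-sided test when the true mean is C + delta."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = delta / (sd / math.sqrt(n))  # true difference in standard-error units
    # Probability that the test statistic falls outside (-z_crit, z_crit).
    return std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)

power = power_two_sided_z(delta=5, sd=10, n=32)
beta = 1 - power
print(f"power = {power:.2f}, beta = {beta:.2f}")

# When the null hypothesis is true (delta = 0), the rejection rate is alpha.
print(f"{power_two_sided_z(delta=0, sd=10, n=32):.3f}")
```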

Break

  • What have you just learned?
    • Decision rules, Type I and II errors
  • What is coming next?
    • p-values

What is a p-value?

  • Let t = (\(\bar{X}-C\)) / se
  • p-value = Prob of sample result, t, or a result more extreme,
    • assuming the null hypothesis is true
  • Small p-value, reject \(H_0\)
  • Large p-value, accept \(H_0\)
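
A two-sided p-value can be sketched with the normal approximation, which is reasonable when n is large. The function name is hypothetical. For small samples the t distribution with n-1 degrees of freedom is needed, and this sketch does not cover that case.

```python
from statistics import NormalDist

def two_sided_p_normal(t):
    """P[|Z| >= |t|] under the null hypothesis, using the standard normal."""
    return 2 * (1 - NormalDist().cdf(abs(t)))

# A statistic close to zero gives a large p-value.
print(round(two_sided_p_normal(0.5), 3))
# A statistic far from zero gives a small p-value.
print(round(two_sided_p_normal(3.0), 3))
```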

Alternate interpretations

  • Consistency between the data and the null
    • Small value, inconsistent
    • Large value, consistent
  • Evidence against the null
    • Small, lots of evidence against the null
    • Large, little evidence against the null

What the p-value is not (1/2)

  • A p-value is NOT the probability that the null hypothesis is true.
    • P[t or more extreme | null] is different than
    • P[null | t or more extreme]
      • P[null] is nonsensical
      • \(\mu\) is an unknown constant (no sampling error)

What the p-value is not (2/2)

  • Not a measure FOR either hypothesis
    • Little evidence against the null \(\ne\) lots of evidence for the null
  • Not very informative if it is large
    • Need a power calculation, or
    • Narrow confidence interval
  • Not very helpful for huge data sets

Pop quiz, revisited

A research paper computes a p-value of 0.45. How would you interpret this p-value?

  1. Strong evidence for the null
  2. Strong evidence for the alternative
  3. Little or no evidence for the null
  4. Little or no evidence for the alternative
  5. More than one answer above is correct.
  6. I do not know the answer.

Break

  • What have you just learned?
    • p-values
  • What is coming next?
    • Criticisms of p-values and hypothesis testing

Figure 1: xkcd cartoon about jelly beans and cancer

What is p-hacking?

  • Abuse of the hypothesis testing framework.
    • Run multiple tests on the same outcome
    • Test multiple outcome measures
    • Remove outliers and retest
  • Defenses against p-hacking
    • Bonferroni
    • Primary versus secondary
    • Published protocol
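
To see why multiple testing is a problem, and what the Bonferroni correction does about it, consider the sketch below. It assumes the tests are independent, and the choice of 20 tests is only illustrative.

```python
def familywise_error(alpha, m):
    """Chance of at least one false positive across m independent tests."""
    return 1 - (1 - alpha) ** m

def bonferroni_threshold(alpha, m):
    """Per-test cutoff that holds the overall Type I error rate near alpha."""
    return alpha / m

m = 20  # illustrative number of tests
fwe = familywise_error(0.05, m)
cutoff = bonferroni_threshold(0.05, m)
print(f"P[at least one false positive] = {fwe:.2f}")
print(f"Bonferroni cutoff per test     = {cutoff:.4f}")
```

With 20 independent tests at \(\alpha\) = 0.05, the chance of at least one false positive is well over one half, even if every null hypothesis is true.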

Criticisms of hypothesis testing (1 of 4)

  • Criticisms of the binary hypothesis
    • Dichotomy is simplistic
    • Point null is never true
    • Cannot prove the null
  • Possible remedy
    • \(H_0:\ C-\Delta \le \mu \le C+\Delta\)
    • \(H_1:\ \mu \lt C-\Delta\) or \(\mu \gt C+\Delta\)

Criticisms of hypothesis testing (2 of 4)

  • Criticisms of the p-value
    • Not intuitive, easily misunderstood
    • “results more extreme”
    • Ignores clinical importance
    • Does not measure uncontrolled biases

Criticisms of hypothesis testing (3 of 4)

  • General criticisms
    • Too hard to reject H0
    • Too easy to reject H0
    • Too reliant on a single study
    • Thoughtless application

Criticisms of hypothesis testing (4 of 4)

Figure 2: Cartoon showing interpretation of various p-values

What should you do if you do not have a hypothesis to test?

  • Descriptive statistics
    • Include confidence intervals
  • Qualitative data analysis

Break

  • What have you just learned?
    • Criticisms of p-values and hypothesis testing
  • What is coming next?
    • Review confidence intervals

Standardization of a statistic

General form of standardization

  • \(Z\ or\ t = \frac{statistic-parameter}{se(statistic)}\)

Specific standardization for the mean

  • \(Z\ or\ t = \frac{\bar{X}-\mu}{se(\bar{X})}\)

Convert this to a confidence interval

  • \(P[-t(\alpha/2; n-1) < \frac{\bar{X}-\mu}{se(\bar{X})} < t(\alpha/2; n-1)] = 1-\alpha\)

  • \(P[-t(\alpha/2; n-1)se(\bar{X}) < \bar{X}-\mu < t(\alpha/2; n-1)se(\bar{X})] = 1-\alpha\)

  • \(P[\bar{X}-t(\alpha/2; n-1)se(\bar{X}) < \mu < \bar{X} + t(\alpha/2; n-1)se(\bar{X})] = 1-\alpha\)

If n > 30

  • \(P[\bar{X}-Z(\alpha/2)se(\bar{X}) < \mu < \bar{X} + Z(\alpha/2)se(\bar{X})] \approx 1-\alpha\)

Interpretation

If n < 30, we have 1-\(\alpha\) level of confidence that the population mean lies between

  • \(\bar{X}-t(\alpha/2; n-1)se(\bar{X})\) and
  • \(\bar{X}+t(\alpha/2; n-1)se(\bar{X})\)

If n > 30, we have 1-\(\alpha\) level of confidence that the population mean lies between

  • \(\bar{X}-Z(\alpha/2)se(\bar{X})\) and
  • \(\bar{X}+Z(\alpha/2)se(\bar{X})\)

Other forms of the confidence interval

  • Simpler (too simple?)
    • \(\bar{X}-2\ se(\bar{X})\) and
    • \(\bar{X}+2\ se(\bar{X})\)
  • Use \(t(\alpha/2; n-1)\) even if n > 30
  • Do not use these alternate forms for your homework.

Figure 3: Excerpt from Mondal 2023

Figure 4: Table 1 from Mondal 2023

Confidence interval for Ease Score

  • Calculations
    • \(se(\bar{X})=\frac{8.23}{\sqrt{14}}=2.19956\)
    • \(t(0.025, 13) = 2.16\)
    • \(46.94 - 2.16 \times 2.19956=42.18895\)
    • \(46.94 + 2.16 \times 2.19956=51.69105\)
  • We are 95% confident that the population mean ease score of ChatGPT education guides is between 42 and 52.

Confidence interval for overall similarity

  • Calculations
    • \(se(\bar{X})=\frac{11.46}{\sqrt{14}}=3.062814\)
    • \(t(0.025, 13) = 2.16\)
    • \(27.07 - 2.16 \times 3.062814=20.45432\)
    • \(27.07 + 2.16 \times 3.062814=33.68568\)
  • We are 95% confident that the population mean similarity is between 20% and 34%.
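
The two interval calculations for Mondal 2023 can be reproduced with a short Python sketch. The means, standard deviations, sample size, and the critical value 2.16 are taken from the slides. The function name `mean_ci` is just a label chosen here.

```python
import math

def mean_ci(x_bar, sd, n, t_crit):
    """Return (lower, upper) limits: x_bar -/+ t_crit * sd / sqrt(n)."""
    se = sd / math.sqrt(n)
    margin = t_crit * se
    return x_bar - margin, x_bar + margin

T_CRIT = 2.16  # t(0.025, 13), as given in the slides

ease = mean_ci(x_bar=46.94, sd=8.23, n=14, t_crit=T_CRIT)
similarity = mean_ci(x_bar=27.07, sd=11.46, n=14, t_crit=T_CRIT)

print("Ease score:", [round(v, 2) for v in ease])
print("Similarity:", [round(v, 2) for v in similarity])
```

Rounding only at the final step, as here, avoids carrying rounding error through the intermediate calculations.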

Effect sizes

  • Cohen’s d = \(\frac{\bar{X}-C}{S}\), where S is the sample standard deviation
  • Useful for
    • Systematic overviews
    • Intermediate calculation in sample size formulas
  • Criticisms
    • Clinical relevance requires units of measure
    • Small, medium, large are arbitrary labels

Break

  • What have you just learned?
    • Review confidence intervals
  • What is coming next?
    • SPSS examples

Baseball data dictionary (1/2)

This file was downloaded from the DASL (Data and Story Library) website. There are no details about who created the data set or what permissions are allowed. Educational uses of this data are probably allowed under the Fair Use provisions of U.S. Copyright Law.

This is a tab delimited data file. There are 50 rows and 2 columns of data.

The first variable is the sample number (1 to 50). The second variable is the circumference of the baseball in inches. The variable names are included at the top of the data.

Baseball data dictionary (2/2)

The standard circumference of a baseball, according to Wikipedia and other sources on the Internet, is 9 to 9.25 inches. There are no missing values in this data set.

This data dictionary was written by Steve Simon on 2023-09-10 and is placed in the public domain.

Please be sure to skip past this documentation while importing the data.

Figure 5: SPSS import dialog box

Figure 6: SPSS one-sample t-test dialog box

Figure 7: SPSS one-sample t-test output (1/3)

Check that \(\frac{0.049415}{\sqrt{50}}=0.006988\).

Approximate confidence interval

  • Note that \(se(\bar{X}) \approx 0.007\)
    • 9.118 - 0.014 = 9.104
    • 9.118 + 0.014 = 9.132

Figure 8: SPSS one-sample t-test output (2/3)

Check that \(\frac{9.11754-9.125}{0.006988}=-1.068\).

Converting the SPSS confidence interval

  • We are 95% confident that \(\mu-9.125\) is between -0.02150 and 0.00658.
    • Add 9.125 to both sides.
    • -0.02150 + 9.125 = 9.10350
    • 0.00658 + 9.125 = 9.13158
  • Always round at the end.
    • We are 95% confident that the population mean circumference is between 9.10 and 9.13.

Figure 9: SPSS one-sample t-test output (3/3)

Check that \(\frac{9.11754-9.125}{0.049415}=-0.151\).
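
The three hand checks against the SPSS baseball output (standard error, t statistic, and the mean difference divided by the standard deviation) can be done together in Python. The inputs are the rounded values shown on the slides, so the results should agree with SPSS only up to rounding in the last digit.

```python
import math

n = 50
x_bar = 9.11754   # sample mean circumference (inches)
sd = 0.049415     # sample standard deviation
c = 9.125         # test value used in the slides

se = sd / math.sqrt(n)   # standard error of the mean
t = (x_bar - c) / se     # one-sample t statistic
d = (x_bar - c) / sd     # mean difference in standard deviation units

print(f"se = {se:.6f}")
print(f"t  = {t:.3f}")
print(f"d  = {d:.3f}")
```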

BMI data dictionary (1/2)

This file is included as part of the base package of R and was converted by Steve Simon to a text file. There are no details about who created the data set. The code for R is published under an open source license, and the datasets included with R are presumably covered by the same license.

This is a tab delimited data file. There are 15 rows and 3 columns of data.

BMI data dictionary (2/2)

The first variable is the sample number (1 to 15). The second variable is the height of an adult female (inches). The third variable is the weight (pounds). The variable names are not included at the top of the data.

This data dictionary was written by Steve Simon on 2023-09-10 and is placed in the public domain.

Please be sure to skip past this documentation while importing the data.

Figure 10: SPSS data after importing

Figure 11: SPSS data after variable name change

Converting height to meters

Figure 12: SPSS dialog box for converting from inches to meters

Figure 13: SPSS dialog box for converting from pounds to kilograms

Figure 14: SPSS dialog box for BMI
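
The same unit conversions and BMI calculation can be sketched outside of SPSS. The conversion factors are the standard definitions (1 inch = 0.0254 meters, 1 pound = 0.45359237 kilograms). The function name and the example height and weight are illustrative, not taken from the SPSS dialog boxes.

```python
INCH_TO_M = 0.0254        # meters per inch (exact by definition)
LB_TO_KG = 0.45359237     # kilograms per pound (exact by definition)

def bmi_from_imperial(height_in, weight_lb):
    """BMI = weight in kilograms divided by the square of height in meters."""
    height_m = height_in * INCH_TO_M
    weight_kg = weight_lb * LB_TO_KG
    return weight_kg / height_m ** 2

# Illustrative values: 65 inches tall, 135 pounds.
bmi = bmi_from_imperial(65, 135)
print(round(bmi, 1))
```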

Select Analyze | Compare Means and Proportions | One-sample T Test from the SPSS menu.

Figure 15: SPSS dialog box for one-sample t-test

Output from a one-sample t-test (1/3)

Figure 16: SPSS output from one-sample t-test (1/3)

Figure 17: SPSS output from one-sample t-test (2/3)

Figure 18: SPSS output from one-sample t-test (3/3)

Summary

  • What have you learned?
    • Populations, samples, parameters, statistics
    • The null and alternative hypotheses
    • Decision rules, Type I and II errors
    • p-values and criticisms
    • Confidence intervals
    • SPSS examples